Dockerfile - Add ROCm6.3 dockerfile #790

Open
polarG wants to merge 9 commits into main from dev/hongtaozhang/add-dockerfile-rocm-6.3.x

polarG commented Mar 16, 2026

Description
Add ROCm6.3 dockerfile.

Copilot AI review requested due to automatic review settings March 16, 2026 05:01
polarG requested a review from a team as a code owner March 16, 2026 05:01
polarG self-assigned this Mar 16, 2026
polarG requested review from abuccts and guoshzhao March 16, 2026 05:02
polarG added the ROCm and enhancement labels Mar 16, 2026

Copilot AI left a comment

Pull request overview

Adds a new ROCm 6.3 container recipe to the repo’s dockerfile/ collection, targeting the rocm/pytorch-training:v25.6 base image and layering SuperBench build/install steps plus common tooling needed for benchmarks.

Changes:

  • Introduces dockerfile/rocm6.3.x.dockerfile for a ROCm 6.3.4 + PyTorch training base image.
  • Installs additional system tools (Docker client, OFED if missing, Intel MLC) and configures SSH/limits.
  • Builds SuperBench third-party dependencies and installs the package with AMD worker extras.

6 comment threads on dockerfile/rocm6.3.x.dockerfile (all marked Outdated)
codecov Bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.69%. Comparing base (700d650) to head (2a6698f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #790   +/-   ##
=======================================
  Coverage   85.69%   85.69%           
=======================================
  Files         103      103           
  Lines        7890     7890           
=======================================
  Hits         6761     6761           
  Misses       1129     1129           
| Flag | Coverage Δ |
| --- | --- |
| cpu-python3.10-unit-test | 70.42% <ø> (ø) |
| cpu-python3.12-unit-test | 70.42% <ø> (ø) |
| cpu-python3.7-unit-test | 69.85% <ø> (ø) |
| cuda-unit-test | 83.60% <ø> (ø) |

Copilot AI review requested due to automatic review settings March 16, 2026 18:52

Copilot AI left a comment

Pull request overview

Adds a new ROCm 6.3 build target to the repository’s Docker image build pipeline, enabling CI to produce a superbench/main:rocm6.3 image variant.

Changes:

  • Introduces dockerfile/rocm6.3.x.dockerfile based on rocm/pytorch-training:v25.6, with additional system deps, Docker CLI, OFED (conditional), and SuperBench build/install steps.
  • Updates .github/workflows/build-image.yml to build and tag the new ROCm 6.3 image on the self-hosted ROCm runner.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dockerfile/rocm6.3.x.dockerfile | New ROCm 6.3 Dockerfile building SuperBench on top of rocm/pytorch-training:v25.6. |
| .github/workflows/build-image.yml | Adds a rocm6.3 entry to the build matrix to produce/push superbench/main:rocm6.3. |

6 comment threads on dockerfile/rocm6.3.x.dockerfile (4 marked Outdated)
Copilot AI review requested due to automatic review settings March 25, 2026 22:15

Copilot AI left a comment

Pull request overview

Adds a new ROCm 6.3 container build path to the repo, and updates third-party build logic to support explicit ROCm GPU arch selection during rccl-tests compilation.

Changes:

  • Add dockerfile/rocm6.3.x.dockerfile based on rocm/pytorch-training:v25.6 and install additional tooling (Docker client, OFED, MLC).
  • Update third_party/Makefile to optionally build rccl-tests with explicit --offload-arch flags when AMDGPU_TARGETS is provided.
  • Update the GitHub Actions image build matrix to include a rocm6.3 build.
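
The Makefile change listed above maps a list of GPU targets to compiler flags. A minimal shell sketch of that mapping follows. It is not the PR's Makefile logic: the helper name `offload_arch_flags` is hypothetical, and accepting semicolon-, comma-, or space-separated targets is an assumption. `--offload-arch=<gfx>` is the hipcc/clang flag for selecting a GPU architecture.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): turn a list of GPU architectures
# into one --offload-arch flag per target.
# Assumption: targets may be separated by ';', ',' or spaces.
offload_arch_flags() {
    flags=""
    # Normalize ';' and ',' to spaces, then split on whitespace.
    for arch in $(printf '%s' "$1" | tr ';,' '  '); do
        flags="${flags:+$flags }--offload-arch=$arch"
    done
    printf '%s\n' "$flags"
}

offload_arch_flags "gfx90a;gfx942"
# prints: --offload-arch=gfx90a --offload-arch=gfx942
```

When the list is empty the helper prints an empty line, so no explicit architecture flags would be passed, which mirrors the "optional" behavior described in the review summary.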

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| third_party/Makefile | Adds conditional arch-aware rccl-tests build flags driven by AMDGPU_TARGETS. |
| dockerfile/rocm6.3.x.dockerfile | Introduces a new ROCm 6.3 image definition and wires in third-party builds and package installs. |
| .github/workflows/build-image.yml | Adds the new rocm6.3 image to CI build/push matrix. |

5 comment threads: 4 on dockerfile/rocm6.3.x.dockerfile and 1 on third_party/Makefile (all marked Outdated)
Hongtao Zhang and others added 2 commits April 21, 2026 22:48
- Pin botocore/boto3 to 1.35.98 for reproducible builds
- Remove unused ARG NUM_MAKE_JOBS and corresponding CI build_args
- Derive UBUNTU_VERSION dynamically via lsb_release inside OFED RUN block
- Fix OFED comment to match actual logic
- Switch OFED download from HTTP to HTTPS
- Split setuptools install into separate RUN layers to avoid masking failures
- Add rm -rf .git after build to reduce image size
- Change ifdef to ifneq for AMDGPU_TARGETS non-empty check in Makefile
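
The "avoid masking failures" item in the commit message above can be illustrated in general terms. The original RUN line is not shown in this conversation, so the sketch below is only one common way a combined shell command can hide a failure: when commands are joined with `;`, the exit status is that of the last command, while `&&` stops at the first failure. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-ins for two install steps; the first one fails.
step_one() { return 1; }
step_two() { return 0; }

# Joined with ';': only the exit status of the last command is reported.
masked() { step_one; step_two; }

# Joined with '&&': the first failure stops the chain and is reported.
chained() { step_one && step_two; }

if masked; then echo "masked: success reported"; else echo "masked: failure reported"; fi
if chained; then echo "chained: success reported"; else echo "chained: failure reported"; fi
```

Docker fails a build only when a RUN instruction's command exits non-zero, so giving each step its own RUN (or chaining with `&&`) lets a failing step stop the build instead of being hidden by a later success.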
Copilot AI review requested due to automatic review settings April 21, 2026 22:49

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread dockerfile/rocm6.3.x.dockerfile
polarG and others added 2 commits April 27, 2026 14:43
- Set strategy.fail-fast: false in build-image.yml so a transient ROCm
  self-hosted runner failure does not abort sibling CUDA image builds.
- Promote AMDGPU_TARGETS to a build ARG so it can be overridden via
  --build-arg at docker build time (e.g., to add gfx950 for newer cards).
- Add a comment documenting that RCCL is intentionally taken from the
  base image (no custom build / LD_PRELOAD) for ROCm 6.3.
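
Because `AMDGPU_TARGETS` is described above as a build `ARG`, it can be overridden on the command line. The sketch below only prints a candidate command rather than running it. The Dockerfile path and image tag come from this PR; the helper name `rocm_build_cmd`, the semicolon-separated target format, the `gfx942` value, and using the repository root as build context are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print, but do not run, a docker build
# command that overrides the AMDGPU_TARGETS build argument.
# Assumption: targets are passed as a semicolon-separated list.
rocm_build_cmd() {
    targets="$1"
    # Single quotes keep ';' from being treated as a command separator
    # when the printed command is pasted into a shell.
    printf 'docker build --build-arg AMDGPU_TARGETS=%s -f dockerfile/rocm6.3.x.dockerfile -t superbench/main:rocm6.3 .\n' "'$targets'"
}

rocm_build_cmd "gfx942;gfx950"
```

Printing the command first makes it easy to review the target list before starting a long image build.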
Copilot AI review requested due to automatic review settings May 3, 2026 04:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Comment on lines +16 to +18
# - cmake: 3.18.5
# - rocm-cmake: 0.14.0.60304-76
# - amd-smi: 25.1.0+8dc45db

ADD third_party third_party

RUN make RCCL_HOME=/opt/rocm ROCBLAS_BRANCH=release-staging/rocm-rel-6.3 HIPBLASLT_BRANCH=release-staging/rocm-rel-6.3 ROCM_VER=rocm-5.5.0 -C third_party rocm -o cpu_hpl -o cpu_stream -o megatron_lm -o rocm_megatron_lm
Labels

enhancement, ROCm
